Querying Linguistic Annotations
Abstract
Over the past decade, a variety of expressive linguistic query languages have been developed. The most scalable of these have been implemented on top of an existing database engine. However, with the arrival of efficient, wide-coverage parsers, it is feasible to parse text on a scale that is several orders of magnitude larger. We show that the existing database approach will not scale up, and speculate on a new approach that leverages proximity search in the context of an IR engine. We also propose a simple syntax for querying linguistic annotations, avoiding the usability problems with existing tree query languages.
Similar resources
Indexing and Querying Linguistic Metadata and Document Content
The need for efficient corpus indexing and querying arises frequently both in machine learning-based and human-engineered natural language processing systems. This paper presents the ANNIC system, which can index documents not only by content, but also by their linguistic annotations and features. It also enables users to formulate versatile queries mixing keywords and linguistic information...
Knowledge-based Multimodal Data Representation and Querying
This paper focuses on the representation and querying of knowledge-based multimodal data. Our work stands in the multidisciplinary project OTIM (Tools for Multimodal Annotation) dedicated to the development of tools for multimodal annotation of French conversational data. OTIM aims at encoding and manipulating annotations from all the linguistic domains in a unique framework. Defining a data m...
Querying Linguistic Trees
Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in nature, and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing languages suffe...
Identifying complex phenomena in a corpus via a treebank lens
While syntactically annotated corpora known as treebanks have been available for many years, along with a variety of customized tools for querying these annotations, the mapping from actual annotations to relevant syntactic or semantic phenomena has been obscured by the coarse-grained labelling of nodes in the parse trees which make up the treebanks. This lack of linguistic detail has hampered ...
Semantic Technologies for Querying Linguistic Annotations: An Experiment Focusing on Graph-Structured Data
With growing interest in the creation and search of linguistic annotations that form general graphs (in contrast to formally simpler, rooted trees), there also is an increased need for infrastructures that support the exploration of such representations, for example logical-form meaning representations or semantic dependency graphs. In this work, we heavily lean on semantic technologies and in ...